Skip to content

Conversation

@tiran
Copy link
Collaborator

@tiran tiran commented Sep 14, 2025

Convert the DependencyNode and DependencyEdge into data classes. The classes are frozen (immutable) and support comparison, sorting, and hashing.

Extend DependencyNode to get all install dependencies and build requirements. The new method return unique dependencies by recursively walking the dependency graph. The build requirements include all recursive installation requirements of build requirements.

@tiran
Copy link
Collaborator Author

tiran commented Sep 22, 2025

@dhellmann This MR implements the logic to get build and install requirements of a DependencyNode.

Extend `DependencyNode` to get all install dependencies and build
requirements. The new method return unique dependencies by recursively
walking the dependency graph. The build requirements include all
recursive installation requirements of build requirements.

Signed-off-by: Christian Heimes <[email protected]>
c.add_child(f, Requirement(f.canonicalized_name), RequirementType.BUILD_BACKEND)

assert sorted(a.iter_install_requirements()) == [b, e]
assert sorted(a.iter_build_requirements()) == [c, e, f]
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would expect to see D in this list because D is a build requirement of C. We can't build A until we can install C, and we can't install C until we can build it.

Like I think I said on another review, we have the same issue here that we had with determining whether something was a build dependency during bootstrap. There, the package "inherits" the "build dependency" type from the why chain. If anything earlier in the why chain is a build dependency, then so is the current package.

In the bootstrap process we build packages using in-order traversal. We build the build dependencies, then the package, then its install dependencies. That separates the dependency types and allows us to cope with potential circular dependencies in installation dependencies, since those are not forbidden.

I think we need to apply the same logic when building in parallel. That is going to potentially limit the amount we do in parallel, but I think that's required to achieve a correct build, without deadlocking on cyclic dependencies or trying to use something that isn't available.

I think that means keeping track of 2 things in build-parallel for each package: Is the package ready to be built, and is the package ready to be used to build other packages. That will complicate the logic for deciding what work to do, and while a literal topological traversal will give us the right order for sequential build I don't think it gives us parallel builds on its own.

The current implementation in the main branch doesn't attempt to traverse the graph explicitly. During each pass it iterates over the set of all unbuilt nodes and asks if the build dependencies for the node have been built. If so, it adds the current node to the set to be built. I think we need to extend that logic so that for each build dependency we check not only that it has been built, but that its installation dependencies are built. Only at that point can we say that the build dependencies are ready to be used to build the current node.

We could build that using the topological sorter by having the sorter tell us when a package is ready to be built, but then keeping those built nodes in a separate list and waiting to mark them as done in the graph when the installation dependencies are also built.

Alternatively, we could fix the loop we have already to keep track of the same information.

I'm not sure which will be easier to follow.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've done some refactoring of the existing logic and added install dependency checking in #794

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would expect to see D in this list because D is a build requirement of C. We can't build A until we can install C, and we can't install C until we can build it.

No, A does not need D to build. A only needs C+E+F to build. The indirect dependencies A -> C -> D is handled by the topology graph. C does not become available until D is built. Because C has a dependency on D and A has a dependency on C, the topology converges on the solution D, then C, then A.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants